Similar resources
Escaping Saddles with Stochastic Gradients
We analyze the variance of stochastic gradients along negative curvature directions in certain nonconvex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensiona...
Estimating or Propagating Gradients Through Stochastic Neurons
Stochastic neurons can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic neurons, i.e., can we “back-propagate” through these stochastic neurons? We examine this question, existing approaches, and present two novel families of solutions, applic...
Revisiting stochastic off-policy action-value gradients
Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural ...
Triply Stochastic Gradients on Multiple Kernel Learning
Multiple Kernel Learning (MKL) is highly useful for learning complex data with multiple cues or representations. However, MKL is known to have poor scalability because of the expensive kernel computation. Dai et al. (2014) proposed to use a doubly Stochastic Gradient Descent algorithm (doubly SGD) to greatly improve the scalability of kernel methods. However, the algorithm is not suitable for MK...
Stochastic Gradient MCMC with Stale Gradients
Stochastic gradient MCMC (SG-MCMC) has played an important role in large-scale Bayesian learning, with well-developed theoretical convergence properties. In such applications of SG-MCMC, it is becoming increasingly popular to employ distributed systems, where stochastic gradients are computed based on some outdated parameters, yielding what are termed stale gradients. While stale gradients could...
Journal
Journal title: Interdisciplinary Information Sciences
Year: 2009
ISSN: 1347-6157, 1340-9050
DOI: 10.4036/iis.2009.345